The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS

Authors

  • Vuk Ercegovac
  • David J. DeWitt
  • Raghu Ramakrishnan
Abstract

We introduce a benchmark called TEXTURE (TEXT Under RElations) to measure the relative strengths and weaknesses of combining text processing with a relational workload in an RDBMS. While the well-known TREC benchmarks focus on quality, we focus on efficiency. TEXTURE is a micro-benchmark for query workloads, and considers two central text support issues that previous benchmarks did not: (1) queries with relevance ranking, rather than those that just compute all answers, and (2) a richer mix of text and relational processing, reflecting the trend toward seamless integration. In developing this benchmark, we had to address the problem of generating large text collections that reflected the (performance) characteristics of a given “seed” collection; this is essential for a controlled study of specific data characteristics and their effects on performance. In addition to presenting the benchmark, with performance numbers for three commercial DBMSs, we present and validate a synthetic generator for populating text fields.

Similar Resources

XMach-1: A Benchmark for XML Data Management

We propose a scaleable multi-user benchmark called XMach-1 (XML Data Management benchmark) for evaluating the performance of XML data management systems. It is based on a web application and considers different types of XML data, in particular text documents, schema-less data and structured data. We specify the structure of the benchmark database and the generation of its contents. Furthermore,...

Implementing Geospatial Operations in an Object-Relational Database System

Over the last decade the need to implement functions into a DBMS that are application-specific has increased. For this reason today most object-relational DBMS (ORDBMS) provide features that allow the user to include application-specific functions into the DBMS for their execution within database queries. This paper reports on an implementation effort to include spatial operations into an ORDBMS a...

Evaluating Join Performance on Relational Database Systems

The join operator is fundamental in relational database systems. Evaluating join queries on large tables is challenging because records need to be efficiently matched based on a given key. In this work, we analyze join queries in SQL with large tables in which a foreign key may be null, invalid or valid, given a referential integrity constraint. We conduct an extensive join performance evaluati...

Scalable Persisting and Querying of Streaming Data by Utilizing a NoSQL Data Store

Relational databases provide technology for scalable queries over persistent data. In many application scenarios a problem with conventional relational database technology is that loading large data logs produced at high rates into a database management system (DBMS) may not be fast enough, because of the high cost of indexing and converting data during loading. As an alternative a modern index...

Optimizing Unbound-property Queries to RDF Views of Relational Databases

SAQ (Semantic Archive and Query) is a system for querying and long-term preservation of relational data in terms of RDF. In SAQ relational data in a back-end DBMS is exposed as an RDF view, called the RD-view. SAQ can process arbitrary SPARQL queries to the RD-view. In addition long-term preservation as RDF of selected parts of a relational database is specified by SPARQL queries to the RD-view...

Publication date: 2005